The Datapreneurs by Bob Muglia and Steve Hamm

Author: Muglia, Bob; Hamm, Steve
Language: eng
Format: epub
ISBN: 7250357
Publisher: Skyhorse Publishing Company, Incorporated
Published: 2023-06-15T00:00:00+00:00


CHAPTER 8

UNLOCKING COMPLEX DATA

I wrote in Chapter 4 that “unstructured data” is a misleading term for describing video, still images, audio, music, books, and various text communications. While people easily comprehend these data types, computers struggle with them. Machine learning and foundation models will change that.

It makes much more sense to call these data types “complex” rather than “unstructured.” Complex data is a collection of many different, unique formats. Now, for the first time, we can train computers to understand these other formats, much as kids learn to read in elementary school and, as they progress, gain proficiency in understanding more complicated topics. As we develop a new generation of applications, a computer’s ability to handle and understand complex data types is vital. Once they can do this, we open a new world of applications that can extract information from these data sources, potentially combining them with other data types to reveal new insights.

The story of complex data begins for me with Bill Gates and his vision of “information at your fingertips.” Bill and others at Microsoft understood that putting all the world’s digital information at people’s beck and call means that data must be discoverable by a computer. At that time, video and audio on PCs were in their infancy, so we thought mainly about image and text-based files. Of course, the applications that created and used text-based and image files understood the format of these files, but in general, the contents remained opaque to everything else.

The Cairo project’s developers tried to make complex files searchable, and a critical task in my role as program manager was supporting their efforts to do it. But, at that time, we were years away from having the machine learning mechanisms needed for the computer to understand the information within those files and make it available in response to queries.

Today, digital documents, photos, and videos are everywhere, creating a veritable tsunami of complex data. We have many new uses for it.

Decoding video, for example, has become critically important. There is a lot of focus on building drones, electric vehicles, and other robots that can operate autonomously. These systems use video cameras to understand the world around them. That requires the computer to analyze the video stream and understand what it sees. Processing this complex data is a particularly challenging problem because the robot may need to react quickly to what it sees. That requires an immediate understanding of what is happening. For example, a moving autonomous vehicle must know what to do if its camera detects a child running into the street ahead. These types of situations require processing complex data in real time.

New generations of foundation models dramatically accelerate our progress and open new worlds for previously impossible applications. Machine-learning algorithms make it easier for computers to identify the information contained in video, audio, and still images and to make it available.


